epiDisplay (version 3.5.0.2)

Data for cleaning: Dataset for practicing cleaning, labelling and recoding

Description

The data come from clients of a family planning clinic.

For all variables except ID, the values 9, 99, 99.9, 888 and 999 represent missing values.
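
As a minimal sketch of how these codes might be turned into NA, the snippet below uses a small made-up data frame (`toy`, not the real Planning data) whose column names mirror a few of the variables listed under Format. The helper vector `missing_codes` is a hypothetical name introduced here for illustration. The ID column is left untouched, so an ID of 9 stays as it is.

```r
# Toy data with made-up values; only the column names follow Planning
toy <- data.frame(
  ID    = c(7, 8, 9, 10),          # an ID of 9 is a real value, not a missing code
  AGE   = c(25, 99, 31, 42),
  RELIG = c(1, 9, 2, 1),
  WT    = c(52.5, 99.9, 60, 999)
)

# Codes that the description says stand for missing values
missing_codes <- c(9, 99, 99.9, 888, 999)

# Recode every column except ID
vars <- setdiff(names(toy), "ID")
toy[vars] <- lapply(toy[vars], function(x) replace(x, x %in% missing_codes, NA))

# Count the missing values per column
colSums(is.na(toy))
```

Applying one set of codes to every column is convenient, but a value such as 9 may be legitimate in some variables, so it is worth inspecting each variable before recoding.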

Usage

data(Planning)

Format

A data frame with 251 observations on the following 11 variables.

ID

a numeric vector: ID code

AGE

a numeric vector

RELIG

a numeric vector: Religion

1 = Buddhist
2 = Muslim

PED

a numeric vector: Patient's education level

1 = none
2 = primary school
3 = secondary school
4 = high school
5 = vocational school
6 = university
7 = other
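
One possible way to attach these labels is base R's `factor()`. The sketch below uses a short made-up vector `ped` rather than the real column, and treats 9 as a missing-value code as stated in the Description. The name `ped_f` is hypothetical.

```r
# Made-up education codes for illustration; 9 stands for "missing"
ped <- c(1, 3, 2, 6, 9, 4)
ped[ped == 9] <- NA

# Label the codes 1 to 7 with the levels documented above
ped_f <- factor(
  ped,
  levels = 1:7,
  labels = c("none", "primary school", "secondary school", "high school",
             "vocational school", "university", "other")
)

# Tabulate, showing NA as its own category when present
table(ped_f, useNA = "ifany")
```

The same pattern should carry over to the other coded variables (RELIG, INCOME, REASON) by swapping in their own levels and labels.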

INCOME

a numeric vector: Monthly income in Thai Baht

1 = nil
2 = < 1,000
3 = 1,000-4,999
4 = 5,000-9,999
5 = 10,000

AM

a numeric vector: Age at marriage

REASON

a numeric vector: Reason for family planning

1 = birth spacing
2 = enough children
3 = other

BPS

a numeric vector: systolic blood pressure

BPD

a numeric vector: diastolic blood pressure

WT

a numeric vector: weight (kg)

HT

a numeric vector: height (cm)

Examples

# NOT RUN {
data(Planning)
des(Planning)

# Change var. name to lowercase
names(Planning) <- tolower(names(Planning)) 
.data <- Planning
des(.data)
# Check for duplication of 'id'
attach(.data)
any(duplicated(id))
duplicated(id)
id[duplicated(id)] #215

# Which one(s) are missing?
setdiff(min(id):max(id), id) # 216

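# Correct the wrong one
# (assign into .data directly; a bare `id[...] <- 216` would only create a modified copy of the attached column)
.data$id[duplicated(.data$id)] <- 216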
detach(.data)
rm(list=ls())
# }
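
The same duplicate check can also be written without `attach()`. The sketch below is a hedged illustration on a made-up data frame (`toy`, with hypothetical values), not the package's own method. It only overwrites the duplicated ID when exactly one duplicate and exactly one gap in the ID sequence are found.

```r
# Toy data with one duplicated id (3) and one missing id (4)
toy <- data.frame(id = c(1, 2, 3, 3, 5), age = c(21, 34, 28, 40, 25))

# Row positions of second and later occurrences of an id
dup_rows <- which(duplicated(toy$id))

# Ids absent from the full min..max sequence
gaps <- setdiff(seq(min(toy$id), max(toy$id)), toy$id)

# Only correct automatically when the fix is unambiguous
if (length(dup_rows) == 1 && length(gaps) == 1) {
  toy$id[dup_rows] <- gaps
}

any(duplicated(toy$id))
```

Working on `toy$id` directly keeps the change inside the data frame, whereas assigning to a variable that is only visible through `attach()` creates a separate copy in the workspace.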